Further Meta-Evaluation of Machine Translation
Abstract
This paper analyzes the translation quality of machine translation systems for 10 language pairs translating between Czech, English, French, German, Hungarian, and Spanish. We report the translation quality of over 30 diverse translation systems based on a large-scale manual evaluation involving hundreds of hours of effort. We use the human judgments of the systems to analyze automatic evaluation metrics for translation quality, and we report the strength of the correlation with human judgments at both the system level and the sentence level. We validate our manual evaluation methodology by measuring intra- and inter-annotator agreement and by collecting timing information.
Similar resources
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Asiya: An Open Toolkit for Automatic Machine Translation (Meta-)Evaluation
This article describes the Asiya Toolkit for Automatic Machine Translation Evaluation and Meta-evaluation, an open framework offering system and metric developers a text interface to a rich repository of metrics and meta-metrics.
(Meta-) Evaluation of Machine Translation
This paper evaluates the translation quality of machine translation systems for 8 language pairs: translating French, German, Spanish, and Czech to English and back. We carried out an extensive human evaluation which allowed us not only to rank the different MT systems, but also to perform higher-level analysis of the evaluation process. We measured timing and intra- and inter-annotator agreement...
Evaluation of Machine Translation with Predictive Metrics beyond BLEU/NIST: CESTA
In this paper, we report on the results of a full-size evaluation campaign of various MT systems. This campaign is novel compared to the classical DARPA/NIST MT evaluation campaigns in the sense that French is the target language, and that it includes an experiment of meta-evaluation of various metrics claiming to better predict different attributes of translation quality. We first describe the...
A New Machine Translation Decoder Based on Artificial Immune System
This paper focuses on decoding as the main part of statistical machine translation. Decoding is considered an NP-complete problem that requires intelligent heuristics to find optimum solutions. In order to solve this problem, we proposed a decoder named DAIS based on the meta-heuristic of artificial immune system. The evaluation is performed on two different corpora. The obtained translations sh...